# Large-scale Pre-training

## Gemma 2 Llama Swallow 2b It V0.1

The Gemma-2-Llama-Swallow series is built through continued pre-training of Gemma 2, significantly enhancing Japanese language capabilities while retaining the original English proficiency.
Tags: Large Language Model, Transformers, Multilingual
Author: tokyotech-llm · Downloads: 61 · Likes: 1
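
A minimal usage sketch, assuming the checkpoint loads through the standard transformers text-generation pipeline; the Hub ID below is inferred from the card title and author, and chat-format input requires a recent transformers release.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1",  # assumed Hub ID
    device_map="auto",
)
# Chat-format input; recent transformers versions apply the chat template.
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
print(generator(messages, max_new_tokens=64)[0]["generated_text"])
```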

## LHM 1B

License: Apache-2.0
LHM is a feed-forward model that reconstructs an animatable 3D human body from a single image within seconds.
Tags: 3D Vision, English
Author: 3DAIGC · Downloads: 169 · Likes: 1

## Simcse Roberta Base Zh

License: MIT
SimCSE (supervised version) is a Chinese sentence-similarity model that optimizes sentence embeddings through supervised contrastive learning.
Tags: Text Embedding, Transformers, Chinese
Author: hellonlp · Downloads: 30 · Likes: 1
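
A minimal similarity-scoring sketch, assuming the Hub ID from the card and [CLS]-token pooling (the pooling SimCSE checkpoints commonly use):

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "hellonlp/simcse-roberta-base-zh"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["今天天气很好", "今天天气不错"]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    # Take the [CLS] vector as the sentence embedding (assumed pooling).
    embeddings = model(**batch).last_hidden_state[:, 0]
score = torch.nn.functional.cosine_similarity(
    embeddings[0], embeddings[1], dim=0
)
print(f"cosine similarity: {score.item():.3f}")
```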

## MERT V1 95M

MERT-v1-95M is a music understanding model trained with the masked language modeling (MLM) paradigm; it has 95M parameters, accepts 24 kHz audio, outputs features at a 75 Hz frame rate, and suits a range of music information retrieval tasks.
Tags: Audio Classification, Transformers
Author: m-a-p · Downloads: 83.72k · Likes: 32
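
A minimal feature-extraction sketch following the usage pattern MERT documents (custom modeling code, hence trust_remote_code=True; the custom code may pull in extra dependencies such as nnAudio):

```python
import numpy as np
import torch
from transformers import AutoModel, Wav2Vec2FeatureExtractor

model_id = "m-a-p/MERT-v1-95M"
processor = Wav2Vec2FeatureExtractor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

audio = np.random.randn(24_000 * 5)  # stand-in for 5 s of 24 kHz audio
inputs = processor(audio, sampling_rate=24_000, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs, output_hidden_states=True).hidden_states
# One (batch, frames, dim) tensor per layer; different layers suit
# different music information retrieval tasks.
print(len(hidden_states), hidden_states[-1].shape)
```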

## Whisper Large V2 Hindi 2.5k Steps

License: Apache-2.0
A Hindi automatic speech recognition (ASR) model fine-tuned from OpenAI Whisper Large V2 on the Common Voice 11.0 dataset, reaching a word error rate (WER) of 10.05%.
Tags: Speech Recognition, Transformers, Other
Author: DrishtiSharma · Downloads: 52 · Likes: 2
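
A minimal transcription sketch via the ASR pipeline; the Hub ID is assumed from the card title and author:

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="DrishtiSharma/whisper-large-v2-hindi-2.5k-steps",  # assumed Hub ID
)
# Accepts a path/URL to an audio file or a raw 16 kHz NumPy array.
print(asr("sample_hindi.wav")["text"])
```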

## Exp W2v2t Ja Unispeech S569

License: Apache-2.0
A Japanese automatic speech recognition model based on microsoft/unispeech-large-1500h-cv and fine-tuned on the Common Voice 7.0 (Japanese) dataset.
Tags: Speech Recognition, Transformers, Japanese
Author: jonatasgrosman · Downloads: 14 · Likes: 0

## Erlangshen MegatronBert 1.3B Sentiment

License: Apache-2.0
A Chinese sentiment analysis model based on the MegatronBert architecture, fine-tuned on multiple sentiment analysis tasks.
Tags: Text Classification, Transformers, Chinese
Author: IDEA-CCNL · Downloads: 93 · Likes: 20
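
A minimal sentiment-scoring sketch, assuming the IDEA-CCNL Hub ID from the card:

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="IDEA-CCNL/Erlangshen-MegatronBert-1.3B-Sentiment",  # assumed Hub ID
)
# Returns e.g. [{'label': ..., 'score': ...}]; label names come from the checkpoint.
print(classifier("这家餐厅的服务太棒了"))
```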

## Erlangshen Roberta 330M NLI

License: Apache-2.0
A version of the Chinese RoBERTa-wwm-ext-large model fine-tuned on multiple natural language inference datasets.
Tags: Text Classification, Transformers, Chinese
Author: IDEA-CCNL · Downloads: 468 · Likes: 5
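
A minimal premise/hypothesis scoring sketch at the tokenizer level; the Hub ID and the label names read from the checkpoint config are assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "IDEA-CCNL/Erlangshen-Roberta-330M-NLI"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Encode premise and hypothesis as a sentence pair.
batch = tokenizer("今天天气很好", "今天是晴天", return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**batch).logits, dim=-1)
for idx, p in enumerate(probs[0]):
    print(model.config.id2label[idx], f"{p.item():.3f}")
```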

## Deberta Large

License: MIT
DeBERTa is an improved BERT model that enhances performance through a disentangled attention mechanism and an enhanced mask decoder, surpassing BERT and RoBERTa on multiple natural language understanding tasks.
Tags: Large Language Model, Transformers, English
Author: microsoft · Downloads: 15.07k · Likes: 16
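
Since microsoft/deberta-large is a pre-trained encoder rather than a task-specific model, a typical starting point is attaching a fresh classification head for fine-tuning; a minimal sketch (the head below is randomly initialized):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-large", num_labels=2  # fresh, untrained head
)
inputs = tokenizer("DeBERTa uses disentangled attention.", return_tensors="pt")
print(model(**inputs).logits.shape)  # torch.Size([1, 2])
```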

## English Model

An English speech recognition model fine-tuned from facebook/wav2vec2-large on the Common Voice dataset; it expects 16 kHz audio input.
Tags: Speech Recognition, Transformers
Author: tanmayplanet32 · Downloads: 30 · Likes: 0

## Chinese Bigbird Wwm Base 4096

License: Apache-2.0
A Chinese pre-trained model based on the BigBird architecture, using the Whole Word Masking (WWM) strategy and supporting a 4096-token context window.
Tags: Large Language Model, Transformers, Chinese
Author: Lowin · Downloads: 13 · Likes: 3

## Xls R 1b Cv 8 Fr

License: Apache-2.0
A French automatic speech recognition model based on facebook/wav2vec2-xls-r-1b, fine-tuned on the French subset of the mozilla-foundation/common_voice_8_0 dataset.
Tags: Speech Recognition, Transformers, French
Author: Plim · Downloads: 26 · Likes: 0
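
A minimal CTC-decoding sketch at the processor/model level (the Whisper example above used the pipeline API instead); the Hub ID is assumed from the card:

```python
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "Plim/xls-r-1b-cv_8-fr"  # assumed Hub ID
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

speech = np.zeros(16_000, dtype=np.float32)  # stand-in for 1 s of 16 kHz audio
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```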

## Deberta Large Mnli Zero Cls

License: MIT
A DeBERTa-large checkpoint packaged for MNLI-based zero-shot classification. DeBERTa enhances BERT with a disentangled attention mechanism and an improved mask decoder, surpassing BERT and RoBERTa on multiple natural language understanding tasks.
Tags: Large Language Model, Transformers, English
Author: Narsil · Downloads: 51.27k · Likes: 14
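
A minimal zero-shot classification sketch, assuming the Hub ID from the card:

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="Narsil/deberta-large-mnli-zero-cls",  # assumed Hub ID
)
result = classifier(
    "The new GPU doubles training throughput.",
    candidate_labels=["hardware", "cooking", "politics"],
)
print(result["labels"][0], result["scores"][0])
```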